Plasmids are extra-chromosomal DNA molecules which code for their own replication. We previously
reported a setup using genes coding for fluorescent proteins of two colors that allowed us, using
a simple model, to extract the plasmid copy number noise in a monoclonal population of bacteria
[J. Wong Ng et al., Phys. Rev. E, 81, 011909 (2010)]. Here we present a detailed calculation relating
this noise to the measured levels of fluorescence, taking into account all sources of fluorescence
fluctuations: the fluctuation of gene expression as in the simple model, but also the growth and
division of bacteria, the non-uniform distribution of their ages, the random partition of proteins
at divisions and the replication and partition of plasmids and chromosome. We show how using the
chromosome as a reference helps to extract the plasmid copy number noise in a self-consistent manner.

PACS numbers: 87.18.Tt, 87.16.-b 
I. INTRODUCTION

Plasmids are highly common in natural bacterial strains and are widely used in studies of gene
expression [1]. They have been seen as a model for genomic replication and partition [1, 2] and
studied as genetic control systems, possibly subject to noise [3]. A number of techniques have been
used to measure plasmid copy numbers (PCN). DNA titration is the simplest, but the least precise.
Quantitative polymerase chain reaction (qPCR) [4] is often used and gives access to the mean PCN in
a population. Two in vivo labeling techniques may a priori give access to PCN distributions when
applied to single cells: fusion of a fluorescent protein with a transcription factor that binds the
plasmids [5, 6], or insertion of a gene coding for a fluorescent protein into the plasmids [7].
However, both have limitations that prevent them from giving access to more than the mean PCN [8].

In the remainder of this Introduction we briefly recall the setup of the experiments reported
previously, making use of dual fluorescence reporters, that allowed us to infer the second moment
of PCN distributions [8]. In Section II we derive the expression for the PCN mean and noise in a
simple case, where only fluctuations of gene expression are considered. The realistic case, taking
into account all sources of fluctuations of the actual experiment, is [...] experimentally measured
quantities. These results and the principle of this work are then discussed. The appendices present
some computations in greater detail.

The gene egfp [9], coding for the green fluorescent protein EGFP, was fused to the inducible, strong
promoter PtacI [10] and then inserted in the chromosome of an E. coli strain. The bacteria were then
transformed with either one of the four plasmids studied here, which contained the fusion
PtacI-mOrange [11]: we thus obtained strains expressing EGFP and the orange fluorescent protein
mOrange at the same time, under the same transcriptional control. After one hour of induction with
IPTG, all protein expression was blocked. Cells were incubated overnight so that all fluorescent
proteins acquired their mature form. For each of the four strains, green and orange fluorescence
intensities of individual cells were then measured. In each experiment at least 10,000 cells were
observed, and at least three experiments were done in each condition.

In general, disentangling the various contributions to the final distribution of fluorescence would
be a difficult problem. However, by making some assumptions on the gene expression processes, we
will be able to express the first and second moments of the number of fluorescent proteins as
functions of those of the copy numbers, and to invert these relations to relate the experimental
measurements to the distribution of PCN. The next section presents this strategy in a simple case.
II. SIMPLE MODEL

We suppose here that during the induction, bacteria do not grow, the plasmids and chromosomes do
not replicate, the protein production does not depend on time [12] and the age distribution of
bacteria is uniform.

FIG. 1. (Color online) Cartoon of the lineage of a bacterium during protein production induction, here depicted with one
division (only one of the two final cells is shown). Fluorescence intensities of single cells are measured at the end of induction.
The orange, resp. green, intensities are proportional to the number of orange proteins $P_O$, resp. green proteins $P_G$, in the
observed cell, shown as orange (dark gray), resp. green (light gray), dots. These proteins were produced during all the induction
by a varying number of mOrange or egfp copies ($n_O$ and $n_G$) and randomly distributed among daughter cells at each division.

We note $P_a^i$ the contribution of the copy $i$ of the gene $a$ ($a = O$ or $G$ for the genes
mOrange or egfp) to the total number of proteins $P_a$ at the end of induction in one cell, and
$n_a$ the number of copies of the gene $a$ in that cell (see Fig. 1). One can write:

$$P_a = \sum_{i=1}^{n_a} P_a^i.$$

The average (over the population) of $P_a$ can thus be written:

$$\langle P_a\rangle = \sum_{n_a}\sum_{i=1}^{n_a}\sum_{P_a^i} p(n_a, P_a^i)\, P_a^i,$$

where $p(n_a, P_a^i)$ is the joint probability of $n_a$ and $P_a^i$. We can suppose that the
distribution of the number of proteins produced by each copy does not depend on the particular copy
considered nor on the number of copies (we measured the same distributions of green fluorescence,
i.e. of expression from the chromosome, for strains bearing both high and low copy number
plasmids [13]). Thus:

$$\langle P_a\rangle = \sum_{n_a} p(n_a)\, n_a \sum_{P_a^i} p(P_a^i)\, P_a^i = \langle n_a\rangle\langle P_a^i\rangle.$$

Moreover we can suppose that on average the number of proteins produced by a copy of a gene does
not depend on the gene (both genes are under the same promoter). Hence, as expected:

$$\frac{\langle n_O\rangle}{\langle n_G\rangle} = \frac{\langle P_O\rangle}{\langle P_G\rangle}. \tag{1}$$

The moments of order 2 can similarly be written:

$$\langle P_a P_b\rangle = \sum_{n_a, n_b}\sum_{i=1}^{n_a}\sum_{j=1}^{n_b}\sum_{P_a^i}\sum_{P_b^j} p(n_a, n_b, P_a^i, P_b^j)\, P_a^i P_b^j,$$

where $P_a$ and $P_b$ are evaluated in the same cell.

In the case of different genes, we can suppose that the correlation does not depend on the
particular copies considered, nor on their numbers. Thus:

$$\langle P_O P_G\rangle = \sum_{n_O, n_G} p(n_O, n_G)\, n_O n_G \sum_{P^O, P^G} p(P^O, P^G)\, P^O P^G = \langle n_O n_G\rangle\langle P^O P^G\rangle.$$

In the case of the same gene, we can suppose that two different copies correlate like two copies of
different genes ($\langle P_a^i P_a^j\rangle = \langle P^O P^G\rangle$, $\forall\, i \neq j$) and
that the auto-correlation of one copy does not depend on the particular copy or gene considered
($\langle (P_a^i)^2\rangle = \langle (P^1)^2\rangle$, $\forall\, a, i$). Then:

$$\langle P_a^2\rangle = \langle n_a\rangle\langle (P^1)^2\rangle + \langle n_a(n_a - 1)\rangle\langle P^O P^G\rangle.$$

Combining those two last expressions with equation (1) we obtain:
Since the replication of the chromosome is well controlled [2, 14], we can suppose that the variance
of the chromosome copy number vanishes ($\langle n_G^2\rangle \approx \langle n_G\rangle^2$) and
that the plasmid and chromosome copy numbers are uncorrelated
($\langle n_O n_G\rangle \approx \langle n_O\rangle\langle n_G\rangle$). Let $\eta$ be the PCN
noise, defined by $\eta^2 = (\langle n_O^2\rangle - \langle n_O\rangle^2)/\langle n_O\rangle^2$. Then:

which, it turns out, does not depend on the chromosome copy number or any other external input, but
solely on quantities directly measured in this experiment.
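The elimination described in this section can be carried out numerically. The sketch below is not the printed formula of the article: it is one way of combining the relations $\langle P_a\rangle = \langle n_a\rangle\langle P^1\rangle$, $\langle P_O P_G\rangle = \langle n_O n_G\rangle\langle P^O P^G\rangle$ and $\langle P_a^2\rangle = \langle n_a\rangle\langle (P^1)^2\rangle + \langle n_a(n_a-1)\rangle\langle P^O P^G\rangle$ under the two assumptions stated above (vanishing chromosome copy number variance, uncorrelated plasmid and chromosome copy numbers). The function names and the numerical values are hypothetical and chosen only for illustration.

```python
def simple_model_noise_sq(mean_po, mean_pg, mean_po2, mean_pg2, mean_popg):
    """PCN noise squared (eta^2) from measured protein-number moments.

    Assumes <n_G^2> = <n_G>^2 and <n_O n_G> = <n_O><n_G>, as in the simple model.
    Arguments are <P_O>, <P_G>, <P_O^2>, <P_G^2> and <P_O P_G>.
    """
    return ((mean_po2 * mean_pg / mean_po - mean_pg2) / mean_popg
            + mean_pg / mean_po - 1.0)


def moments_from_model(n_o_mean, n_o_sq_mean, n_g, m, s, c):
    """Forward model used only to check the inversion.

    m = <P^1>, s = <(P^1)^2>, c = <P^O P^G>; n_g is a fixed chromosome copy number.
    All values are hypothetical illustration inputs.
    """
    mean_po = n_o_mean * m
    mean_pg = n_g * m
    mean_popg = n_o_mean * n_g * c
    mean_po2 = n_o_mean * s + (n_o_sq_mean - n_o_mean) * c
    mean_pg2 = n_g * s + (n_g * n_g - n_g) * c
    return mean_po, mean_pg, mean_po2, mean_pg2, mean_popg


# <n_O> = 10 and <n_O^2> = 110 correspond to eta^2 = 0.1
eta_sq = simple_model_noise_sq(*moments_from_model(10.0, 110.0, 2.0, 5.0, 30.0, 25.0))
```

With these inputs the inversion returns the value $\eta^2 = 0.1$ that was put into the forward model, independently of the chromosome copy number and of the per-copy expression statistics, which is the self-consistency property claimed in the text.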



III. COMPLETE MODEL 

We want now to also take into account sources of fluo- 
rescence fluctuation other than gene expression. We as- 
sume that all cells have exactly the same division time T. 



Two studies report a small variability of division times, with a standard deviation of the growth
time constant of ~10% of the average [15, 16]. We note $t_0$ the age of a cell at the beginning of
induction. Under this hypothesis, the distribution of ages $t_0$ is exponential [17]:
$p(t_0) = (2\ln 2/T)\, 2^{-t_0/T}$. We will also consider that the induction time (one hour) is a
multiple of the division time. This is true at 30 and 37 °C, where we measured cell cycles of 1 h
and 30 min respectively, but not for intermediate temperatures (this is discussed in Section IV).
We will present calculations with cells dividing twice during the induction, i.e. a cell cycle of
30 min; more or fewer divisions only change the numerical prefactors [18].
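As an illustration of the age distribution used here, the following sketch evaluates $p(t_0) = (2\ln 2/T)\,2^{-t_0/T}$, its cumulative distribution, and an inverse-transform sampler. The function names are hypothetical; only the expression of $p(t_0)$ is taken from the text.

```python
import math


def age_pdf(t0, T):
    """Age density p(t0) = (2 ln 2 / T) * 2**(-t0 / T) on [0, T)."""
    return (2.0 * math.log(2.0) / T) * 2.0 ** (-t0 / T)


def age_cdf(t0, T):
    """Cumulative distribution of the age: integral of age_pdf from 0 to t0."""
    return 2.0 * (1.0 - 2.0 ** (-t0 / T))


def sample_age(u, T):
    """Inverse-transform sampling: map a uniform number u in [0, 1) to an age."""
    return -T * math.log2(1.0 - u / 2.0)


def mean_age(T):
    """Mean age of the population, T * (1 / ln 2 - 1), about 0.44 T."""
    return T * (1.0 / math.log(2.0) - 1.0)
```

Newborn cells are twice as frequent as cells about to divide (`age_pdf(0, T)` is twice `age_pdf(T, T)`), which is why the age distribution cannot be taken as uniform in the complete model.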

At each cell division, fluorescent proteins are randomly inherited by one of the two daughter
cells, thus adding to the fluorescence fluctuations. As discussed in Appendix A, this contribution
turns out to be small: to a good precision, half of the fluorescent proteins are inherited by each
daughter cell.

Following one lineage during the induction, we can now express the number of fluorescent proteins
at the end of induction in a given cell:

where we took the age of the cell at the beginning of induction $t_0$ as the initial time and
introduced $\alpha_a(i,t)$, the rate of protein production at time $t$ from the copy $i$ of the
gene $a$ [19].



A. Fluorescence averages

To compute the average of $P_a$ we introduce the joint probability $p[t_0, n_a, \alpha_a]$, which
is now a functional, and the integral is performed over all possible $n_a$ and $\alpha_a$ functions:

$$\langle P_a\rangle = \int_0^T dt_0 \int \mathcal{D}[n_a]\,\mathcal{D}[\alpha_a]\; p[t_0, n_a, \alpha_a]\, P_a[t_0, n_a, \alpha_a]. \tag{3}$$

A number of assumptions on gene expression and replication, similar to those presented in
Section II, are detailed in Appendix B. We use here the hypotheses (i) to (iv) to simplify Eq. (3)
without having to postulate explicit models for gene expression or replication. We then find:

$$\langle P_a\rangle = \frac{3}{4}\,\langle\alpha\rangle\, T \left(\langle\bar n_a\rangle + \frac{1}{T}\int_0^T dt_0\, p(t_0)\int_0^{t_0} dt\, \langle n_a(t)\rangle\right),$$

where $\bar{\cdot}$ is the average over one cycle, which commutes with the average over the
population.

In general we cannot invert this relation so as to express the average copy number as a function of
the average protein number, and we do not know the plasmid replication systems well enough to
evaluate the second term in the parentheses. It is nevertheless possible to bound its ratio to the
mean copy number. We thus define

$$\mathcal{R}_a = \frac{1}{\langle\bar n_a\rangle}\,\frac{1}{T}\int_0^T dt_0\, p(t_0)\int_0^{t_0} dt\, \langle n_a(t)\rangle,$$

and use it to express the mean PCN per chromosome:

$$\frac{\langle\bar n_O\rangle}{\langle\bar n_G\rangle} = \frac{\langle P_O\rangle}{\langle P_G\rangle}\,\frac{1 + \mathcal{R}_G}{1 + \mathcal{R}_O}. \tag{4}$$

We show in Appendix C that $\mathcal{R}_a \in [0.15, 0.45]$. We also computed it after postulating
various shapes for $\langle n_a\rangle$ as a function of time and propose that this interval can be
reduced to [0.36, 0.44] (see Appendix D). The results for the four plasmids we studied, at various
temperatures, are presented in Section IV.



B. Fluorescence cross-correlations

We will follow the same strategy for the correlations, namely bound terms related to plasmid or
chromosome replication and partition. Beside those already mentioned, we use the assumptions (v)
and (vi) presented in Appendix B and introduce:

$$\mathcal{S}_{ab} = \frac{1}{\langle\bar n_a \bar n_b\rangle}\,\frac{1}{T^2}\int_0^T dt_0\, p(t_0)\int_0^{t_0} dt\int_0^{t_0} dt'\, \langle n_a(t)\, n_b(t')\rangle.$$

We can now write:

$$\langle P_O P_G\rangle = \frac{9}{16}\, T^2\, \langle\alpha_O\alpha_G\rangle\, (1 + \mathcal{R}_O + \mathcal{R}_G + \mathcal{S}_{OG})\, \langle\bar n_O\rangle\langle\bar n_G\rangle, \tag{5}$$

where $P_O$ and $P_G$ are evaluated in the same cell. We show in Appendix C that
$\mathcal{S}_{OG} \in [0, 0.45]$, and argue that this interval can be reduced to [0.20, 0.28]
(see Appendix D).



C. Fluorescence auto-correlations 

We consider now the moment of order 2 for the same gene, i.e. $\langle P_a^2\rangle$, with $a = O$
or $G$. We make two more assumptions, (vii) and (viii) in Appendix B, and note $C_\alpha(|t - t'|)$
the auto-correlation function at two times $t$ and $t'$ of the rate of fluorescent protein
production $\alpha$.

Our guess is that the results will not be affected by the 
particular form this auto-correlation function will take; 
to test it we will make two extreme hypotheses: (A) of 
very short "memory" , (B) of infinite (over the whole in- 
duction time) "memory" . 

In the hypothesis (A), we suppose that after a very short time $\tau$ the expression of a copy of a
gene correlates with itself the same way it correlates with other copies. This makes sense if
$\tau$ is small compared to the replication time; and indeed, we expect the auto-correlation of a
particular copy to stem from multiple translations of a given mRNA, which has a typical lifetime of
the order of a minute in bacteria, or from transcriptional bursts, which were shown to happen over
short time scales [20]. In contrast, genes are on average replicated once per cell cycle, i.e.
every few tens of minutes.



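We consider in this hypothesis that $C_\alpha$ is a function peaked at 0, with a non-zero value
beyond a small time $\tau$ such that it does not depend on whether a previous copy was the ancestor
of the considered copy or not:

$$C_\alpha^A(|t - t'|) = \langle\alpha^2\rangle\, \tau\, \delta(t - t') + \langle\alpha_O\alpha_G\rangle\, \big(1 - \tau\, \delta(t - t')\big) - \langle\alpha\rangle^2.$$

This gives:

$$\langle P_a^2\rangle_A = \frac{9}{16}\, T^2\, \langle\alpha_O\alpha_G\rangle\, (1 + 2\mathcal{R}_a + \mathcal{S}_{aa})\, \langle(\bar n_a)^2\rangle + \frac{5}{16}\, \tau\, T\, \big(\langle\alpha^2\rangle - \langle\alpha_O\alpha_G\rangle\big)\, (1 + 3\mathcal{R}_a)\, \langle\bar n_a\rangle. \tag{6}$$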

In the hypothesis (B), we suppose that $C_\alpha$ is constant:

$$C_\alpha^B(|t - t'|) = \langle\alpha^2\rangle - \langle\alpha\rangle^2.$$

(We expect the actual form of $C_\alpha$ to be intermediate between those two, namely a smoothly
declining function on a time scale of a few minutes.) The hypothesis (B) is less realistic. It
could correspond to mutations distinguishing different copies of a given gene. By noting that at
any previous time each copy has exactly one ancestor, this translates into:

$$\sum_{i=1}^{n_a(t)}\sum_{i'=1}^{n_a(t')} \langle\alpha_a(i,t)\, \alpha_a(i',t')\rangle_B = \langle\alpha_O\alpha_G\rangle\, n_a(t)\, n_a(t') + \big(\langle\alpha^2\rangle - \langle\alpha_O\alpha_G\rangle\big)\big(n_a(t)\,\theta(t - t') + n_a(t')\,\theta(t' - t)\big),$$

where $\theta$ is the Heaviside function.

We then introduce a third quantity, $\mathcal{T}_a$, which is defined in Appendix C and can be
shown to lie in the interval [0, 9.9]. (We will argue in Appendix D that this interval can be
reduced to [5.7, 6.1].) Then:

(7)



The two hypotheses (A) and (B) thus only lead to different factors for the contribution of the
average copy number [21]. This term is expected to be small, even in the hypothesis (B), where
$1 + \mathcal{T}_a$ can be of the order of 10: the numerical prefactor cancels it, one can expect
$\langle\alpha^2\rangle$ and $\langle\alpha_O\alpha_G\rangle$ to be of the same order of magnitude
and, already for the plasmid of lowest copy number and for the chromosome,
$\langle\bar n_a\rangle$ is significantly smaller than $\langle(\bar n_a)^2\rangle$. Moreover, if
we let $\tau$ tend to the time of induction $2T$, we recover terms of the same order of magnitude,
thus suggesting a low sensitivity to the actual mathematical translation of the hypotheses. The
results will be presented and discussed with only the hypothesis (A), the more realistic, being
considered; full computations with test functions confirmed that very close values for the PCN
noise were found in the hypothesis (B) (data not shown).
FIG. 2. (Color online) Mean PCN per chromosome $\langle\bar n_O\rangle/\langle\bar n_G\rangle$,
computed using Eq. (4) and the measured average protein numbers $\langle P_O\rangle$ and
$\langle P_G\rangle$. Results are shown for cells grown at 30, 32, 35, 37 and 39 °C and for the
four plasmids studied here, from bottom to top: mini-F (red), mini-R1-par$^+$ (blue),
mini-R1-par$^-$ (green), mini-ColE1 (magenta). The values obtained in three cases are plotted: with
the simple model (squares, solid line), with the complete model and test functions (upper and lower
bounds of the interval: circles, dashed lines) or within a general analysis (upper and lower bounds
of the interval: triangles, dotted lines). The mini-R1 plasmids used here have a synthetic,
thermo-sensitive origin of replication, the control of which is inactivated at high
temperature [23, 24].



IV. RESULTS AND DISCUSSION 

By combining Eqs. (4), (5) and (6) so as to eliminate the gene expression rates, and assuming that
the replication of the chromosome is well controlled, we can now express the PCN noise:

Note that both the auto-correlation time $\tau$ introduced previously and the cell cycle length $T$
have also been eliminated. Only terms related to replication and partition of genes, which we can
bound, and the experimentally measured moments of protein numbers remain [22]. By making the
conservative assumption that $\mathcal{R}_a$ and $\mathcal{S}_{ab}$ can independently take any
value in their intervals, we can compute intervals in which the mean PCN per chromosome and the PCN
noise surely lie. They are presented in Table I for experiments at 37 °C, and in Figs. 2 and 3 for
various temperatures. We report both the intervals estimated with a general analysis and with a


TABLE I. Mean PCN per chromosome $\langle\bar n_O\rangle/\langle\bar n_G\rangle$ and PCN noise $\eta$ computed with data from experiments at 37 °C, using the
simple model or the complete one, either with a set of test functions or within a general analysis. Only the hypothesis (A) of
short "memory" was considered. We assumed that cells divided twice during the induction.

                                                                      mini-F          mini-R1-par$^-$   mini-R1-par$^+$   mini-ColE1
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, simple              0.9             7.2               6.5               87
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, complete/test       [0.84, 0.95]    [6.8, 7.7]        [6.1, 6.9]        [82, 93]
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, complete/general    [0.71, 1.13]    [5.7, 9.1]        [5.1, 8.2]        [69, 110]
$\eta \times 10^2$, simple                                           58              36                30                28
$\eta \times 10^2$, complete/test                                    [50, 67]        [25, 45]          [16, 39]          [13, 38]
$\eta \times 10^2$, complete/general                                 [0, 100]        [0, 74]           [0, 71]           [0, 68]
FIG. 3. (Color online) PCN noise $\eta$ computed using Eq. (8) and the measured average protein
numbers $\langle P_O\rangle$, $\langle P_G\rangle$, and protein number correlations
$\langle P_O^2\rangle$, $\langle P_G^2\rangle$, $\langle P_O P_G\rangle$. Results are shown at
various temperatures for the four plasmids studied here (see the caption of Fig. 2). We considered
that cells divided once during the induction at 30 and 32 °C, twice at 35, 37 and 39 °C. Only the
hypothesis (A) of short "memory" was considered. The results obtained with the simple model are
fully recovered if we suppose a similar behavior for the plasmids and for the chromosome, see the
main text.



set of test functions for the moments of copy numbers. 
Values computed with the simple model are also shown. 
Both for means and noises, the values computed with the 
simple model fall in the middle of the intervals computed 
with the more realistic model. 
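The intervals reported for the mean PCN per chromosome can be checked with a few lines of code. The sketch below assumes that the complete model corrects the simple ratio $\langle P_O\rangle/\langle P_G\rangle$ by the factor $(1 + \mathcal{R}_G)/(1 + \mathcal{R}_O)$, and that $\mathcal{R}_O$ and $\mathcal{R}_G$ vary independently in the same interval. The function name is hypothetical, and the inputs are the rounded "simple" values of Table I, so agreement with the table can only be expected up to that rounding.

```python
def mean_pcn_interval(simple_ratio, r_interval):
    """Bounds on the mean PCN per chromosome.

    simple_ratio is <P_O>/<P_G>; r_interval is the (min, max) range in which
    R_O and R_G are assumed to vary independently.
    """
    r_min, r_max = r_interval
    lower = simple_ratio * (1.0 + r_min) / (1.0 + r_max)
    upper = simple_ratio * (1.0 + r_max) / (1.0 + r_min)
    return lower, upper


GENERAL = (0.15, 0.45)  # general analysis (Appendix C)
TEST = (0.36, 0.44)     # test functions (Appendix D)

SIMPLE_RATIOS = {
    "mini-F": 0.9,
    "mini-R1-par-": 7.2,
    "mini-R1-par+": 6.5,
    "mini-ColE1": 87.0,
}

for name, ratio in SIMPLE_RATIOS.items():
    print(name, mean_pcn_interval(ratio, TEST), mean_pcn_interval(ratio, GENERAL))
```

Because the bounds are the simple-model value multiplied by a factor and by its inverse, the simple-model value always sits at the geometric center of each interval, which is consistent with the observation that it falls in the middle of the intervals of the more realistic model.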

As Fig. 2 shows, we can clearly distinguish the plasmids by their mean PCN per chromosome.
Moreover, these results agree with previous, independent estimates, as discussed in [8]. For the
noises the picture is less clear. In the general study, the intervals found are too large for the
results to be meaningful; but we know that we have largely overestimated them. In contrast, using
test functions allows one to distinguish the plasmids by their PCN noises. In particular, we can
notice that the partition system reduces the noise (compare mini-R1-par$^+$ and mini-R1-par$^-$),
and that a plasmid with a high copy number (mini-ColE1) has a lower noise than a plasmid with a
small copy number (mini-F), even though the latter has a partitioning system [25].

We tested the quality of the inference with simple computer simulations, where stochastic gene
expression and plasmid replication were implemented (see Appendix E for more details). Table II
compares the true and inferred values of the mean PCN per chromosome and the PCN noise in four
cases, corresponding to different assumptions on the age and cell cycle duration distributions. In
each case we find a very good agreement.

As it appears in Eqs. (4), (5) and (6), what we call here "plasmid copy number", or "chromosome
copy number", is precisely the average over one cell cycle of the number of copies of the gene
coding for a fluorescent protein. A quantitative PCR (qPCR) measures
$\langle n_O\rangle_q/\langle n_G\rangle_q$, where
$\langle n_a\rangle_q = \langle\int dt_0\, p(t_0)\, n_a(t_0)\rangle = \int dt_0\, p(t_0)\, \langle n_a(t_0)\rangle$.
This quantity and the ratio $\langle\bar n_O\rangle/\langle\bar n_G\rangle$ reported here take in
general different values. We have indeed noticed a discrepancy between the two approaches, but
other explanations are likely [8].
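The difference between the two averages can be made concrete with a toy copy number profile. The sketch below assumes a gene whose copy number steps from 1 to 2 at a fraction `step_fraction` of the cell cycle (the value 0.4 is the one imposed for the chromosome in the simulations of Appendix E) and compares the age-weighted average probed by qPCR with the plain cell-cycle average used in this article. The function name and the midpoint quadrature are illustrative choices.

```python
import math


def qpcr_vs_cycle_average(step_fraction=0.4, steps=100000):
    """Return (age-weighted average, cell-cycle average) of a 1 -> 2 step profile.

    Time is measured in units of the cell cycle (T = 1); the age distribution is
    p(t0) = 2 ln 2 * 2**(-t0).
    """
    dt = 1.0 / steps
    qpcr_like = 0.0
    cycle = 0.0
    for k in range(steps):
        t = (k + 0.5) * dt                      # midpoint of the k-th bin
        n = 1.0 if t < step_fraction else 2.0   # copy number at age t
        qpcr_like += 2.0 * math.log(2.0) * 2.0 ** (-t) * n * dt
        cycle += n * dt
    return qpcr_like, cycle


qpcr_like, cycle = qpcr_vs_cycle_average()
```

For this profile the cell-cycle average is 1.6, while the age-weighted average is close to 1.52, because young cells, which carry fewer copies, are over-represented in the population.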

We have made strong, but reasonable hypotheses on 
gene expression. We made intuitive notions explicit and 
gave them a well defined mathematical translation. 

A deeper mathematical analysis could significantly reduce the general intervals found, but not
below the intervals found with test functions. Here again, the experimental approach and derivation
of the PCN mean and noise are self-consistent: there are no external inputs, even in the bounded
"correction" quantities $\mathcal{R}_a$ or $\mathcal{S}_{ab}$, which depend only on the way the
genes are replicated and inherited by daughter cells at divisions. Using the chromosome as a
reference allowed us to get rid of global fluctuations: the number of divisions considered does not
affect the results, fluctuations from protein partition at division are suppressed, all
fluctuations of gene expression are cancelled, and even the division time does not appear in the
final result. This suggests that the assumptions that the induction time is a multiple of the
division time and that the variability in division times can be neglected do not affect the
results. The simulations further confirm the robustness of this strategy; in particular, the values
inferred with the crudest assumptions ("simple model") are strikingly close to the true


TABLE II. Test of the inference method with computer simulations, in four cases: 1. a synchronized population of bacteria
with fixed division time, equal to half the induction time; 2. as 1. with an exponential age distribution; 3. as 2. with a
distribution of division times, with mean equal to half the induction time; 4. as 3. with the mean division time equal to one
third of the induction time.

                                                                      case 1          case 2          case 3          case 4
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, true                11.9            9.6             10.1            10.1
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, simple              12.8            10.1            11.0            11.7
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, complete/test       [12.0, 13.6]    [9.5, 10.7]     [10.4, 11.6]    [11.0, 12.4]
$\langle\bar n_O\rangle/\langle\bar n_G\rangle$, complete/general    [10.1, 16.1]    [8.0, 12.7]     [8.7, 13.8]     [9.3, 14.7]
$\eta \times 10^2$, true                                             63              66              75              75
$\eta \times 10^2$, simple                                           60              66              74              83
$\eta \times 10^2$, complete/test                                    [54, 67]        [59, 72]        [68, 80]        [77, 89]
$\eta \times 10^2$, complete/general                                 [0, 93]         [19, 98]        [35, 106]       [47, 114]

ones, both for the mean PCN per chromosome and the PCN noise (see Table II).

The only source of uncertainty that remains stems from the replication and partition of the
plasmids and chromosome. The use of test functions suggests that it does not affect the results
much. Moreover, if we suppose that both are similar, i.e. $\mathcal{R}_O \approx \mathcal{R}_G$ and
$\mathcal{S}_{OO} \approx \mathcal{S}_{GG}$, we fully recover the simple model presented at the
beginning and in the previous article [8].

The next obvious step would be to consider correlations between cells, which could in particular
inform us on plasmid partition. Here however, we lack the information on the lineage (which cells
share a common induced ancestor) necessary to make a practical use of these quantities.

The use of dual reporters to dissect sources of noise was first proposed and demonstrated in a
simple framework: steady state of fully induced bacteria, with both reporters in positions as
similar as possible [26, 27]. Here we took a similar approach further, and made sense of an
intuitive setup: by changing one element, namely the locus of insertion of the genes coding for
fluorescent proteins, we were able to measure one particular source of noise. The analysis proposed
here could serve as a model for other derivations of this strategy.
Appendix A: Partition of proteins at cell divisions

Random partition of fluorescent proteins at cell divisions contributes only to the auto-correlation
(fluctuations) of protein numbers. We suppose a binomial distribution of the number of inherited
proteins. In the case of two divisions, this leads to adding the term
$\frac{7 - 3\mathcal{R}_a}{12\,(1 + \mathcal{R}_a)}\,\langle P_a\rangle$ to $\langle P_a^2\rangle$.
In turn, this translates to adding the correction

$$\frac{1}{12\,(1 + \mathcal{R}_O)}\left[\frac{1 + 3\mathcal{R}_O}{1 + 3\mathcal{R}_G}\,(7 - 3\mathcal{R}_G) - (7 - 3\mathcal{R}_O)\right]\langle P_O\rangle$$

to $\langle P_O^2\rangle$, while leaving $\langle P_G^2\rangle$ unchanged, in the expression of the
PCN noise $\eta$ in the hypothesis (A) [18]. This term varies from $-0.2\,\langle P_O\rangle$ to
$0.3\,\langle P_O\rangle$ when we independently vary $\mathcal{R}_O$ and $\mathcal{R}_G$ in the
interval [0.15, 0.45]. Thus, with an expected number of proteins above 10 (probably hundreds to
thousands), this correction is very small compared to $\langle P_O^2\rangle$ and can be safely
neglected.
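The range quoted for this correction can be scanned numerically. The sketch below rests on two assumptions that are reconstructions rather than statements of the article: binomial partition with probability 1/2 adds $\frac{7 - 3\mathcal{R}_a}{12(1 + \mathcal{R}_a)}\langle P_a\rangle$ to $\langle P_a^2\rangle$ for two divisions, and in hypothesis (A) the mean copy number enters the second moment with the weight $(1 + 3\mathcal{R}_a)$, so that the chromosome term is rescaled by $(1 + 3\mathcal{R}_O)/(1 + 3\mathcal{R}_G)$ when it is transferred to the plasmid. Function names are hypothetical.

```python
def partition_term(r):
    """Coefficient of <P_a> added to <P_a^2> by binomial partition (two divisions)."""
    return (7.0 - 3.0 * r) / (12.0 * (1.0 + r))


def noise_correction(r_o, r_g):
    """Net correction to <P_O^2>, in units of <P_O>, with <P_G^2> left unchanged."""
    rescaled_chromosome = (1.0 + 3.0 * r_o) / (1.0 + 3.0 * r_g) * (7.0 - 3.0 * r_g)
    plasmid = 7.0 - 3.0 * r_o
    return (rescaled_chromosome - plasmid) / (12.0 * (1.0 + r_o))


# Scan R_O and R_G independently over [0.15, 0.45]
grid = [0.15 + 0.30 * k / 30 for k in range(31)]
values = [noise_correction(r_o, r_g) for r_o in grid for r_g in grid]
smallest, largest = min(values), max(values)
```

Under these assumptions the correction ranges from about $-0.22$ to about $0.29$ in units of $\langle P_O\rangle$, in line with the interval $[-0.2, 0.3]$ quoted above, and it vanishes when $\mathcal{R}_O = \mathcal{R}_G$.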



Appendix B: Assumptions on gene expression and 
replication 

We make a number of assumptions in order to simplify 
the expression of the moments of the number of pro- 
teins at the end of induction P a . They are detailed below: 

(i) The age at the beginning of induction $t_0$, the copy numbers of plasmid or chromosome $n_a$,
and the rates of protein production $\alpha_a$ are independent: the probability
$p[t_0, n_a, \alpha_a]$ factorizes into $p[t_0]\,p[n_a]\,p[\alpha_a]$. This means that there is no
growth or expression burden associated with the presence of the plasmids. The notion is still
debated, and whereas we could extract a small inhibition of growth for the strain bearing the
mini-R1-par$^-$ plasmid, we did not see any systematic deviation: we measured comparable growth
rates for all strains at a given temperature; moreover, all strains exhibited the same average
green fluorescence (thus, the same average protein production) [13].

(ii) During the induction, the average protein production rate does not depend on time, on the
particular copy considered or on the gene, egfp or mOrange:

$$\langle\alpha_a(i,t)\rangle = \langle\alpha\rangle, \quad \forall\, a, i, t.$$

Similarly, the correlations of two different copies do not depend on time, on the copies or on the
genes:

$$\langle\alpha_a(i,t)\, \alpha_b(j,t')\rangle = \langle\alpha_O\alpha_G\rangle, \quad \forall\, a, b, i \neq j, t, t'.$$

We assume that the protein production rates immediately reach a stationary state (but this need not
hold for the protein concentrations): Elf et al. have shown that, at 1 mM IPTG, the fraction of
LacI bound to the Lac promoter reaches its steady state value (zero) in less than 10 s [28].
Whereas the promoter we used, PtacI, is slightly different, there is no reason for the dynamics to
be slower [25].

(iii) The dynamics of expression of egfp and mOrange are essentially fixed by the promoter; since
it is the same for both genes, any copy of any of the two will follow the same statistics. Any
systematic difference in the translation rate (mRNA lifetime, codon usage) is incorporated in the
fluorescence per molecule factor. Since the cells are incubated overnight, with chloramphenicol
blocking protein production, we expect all fluorescent proteins to have acquired their mature
form. Lastly, both genes showed the same distribution of fluorescence when inserted in the
chromosome [8].

(iv) The gene copy numbers have reached a steady state and there is no active loss of plasmids
during the cell cycle or systematic bias in the way plasmids are inherited by daughter cells upon
division: on average the plasmid and chromosome copy numbers are periodic of period $T$ and
$\langle n_a(T)\rangle = 2\,\langle n_a(0)\rangle$.

(v) On average the cross-correlations of the rates 
of production of proteins and of the copy numbers of two 
different genes do not depend on the particular copies 
considered nor on time for the first (we discuss different 
forms of the rates auto-correlation below and show that 
they do not affect our results much). 

(vi) The chromosome replication and partition are perfectly controlled [2, 14]:

$$\langle n_G(t)\, n_G(t')\rangle = \langle n_G(t)\rangle\langle n_G(t')\rangle.$$



(vii) We approximate the plasmid copy number auto-correlation function by a constant:

$$\langle n_O(t)\, n_O(t')\rangle - \langle n_O(t)\rangle\langle n_O(t')\rangle = C_{n_O}, \quad \forall\, t, t'.$$

This implies that $\langle n_O(t)\, n_O(t')\rangle$ is periodic in each of its arguments and allows
us to transform the integrals over the induction time to integrals over one cell cycle. In the
absence of a consensus model for plasmid replication or independent measurements, we cannot gauge
a priori the error we thus make. Note however that $C_{n_O}$ will not appear in the result.



(viii) During the induction, the auto-correlation of the expression of a given copy does not depend
on the gene considered, on the particular copy or on the time, but solely on the difference between
two times:

$$\langle\alpha_a(i,t)\, \alpha_a(i',t')\rangle - \langle\alpha\rangle^2 = C_\alpha(|t - t'|),$$

where $i'$ is the ancestor of $i$. This follows from the same arguments as given above (e.g. the
dependency on $|t - t'|$ follows from the assumption that the rate of protein production reached
its steady state).



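Appendix C: Estimation of $\mathcal{R}_a$, $\mathcal{S}_{ab}$ and $\mathcal{T}_a$

We briefly outline here the steps allowing us to bound $\mathcal{R}_a$, $\mathcal{S}_{ab}$ and
$\mathcal{T}_a$. Full derivations can be found in [18]. We define:

$$\mathcal{R}_a = \frac{1}{\langle\bar n_a\rangle}\,\frac{1}{T}\int_0^T dt_0\, p(t_0)\int_0^{t_0} dt\, \langle n_a(t)\rangle.$$

We linearize the age distribution:
$p(t_0) = (2\ln 2/T)\, 2^{-t_0/T} \approx (2\ln 2/T)\,(1 - \ln 2\; t_0/T)$. There exists
$\tilde t_0 \in [T/2, T]$ such that

$$\mathcal{R}_a = \frac{1}{\langle\bar n_a\rangle}\,\frac{2\ln 2}{T^2}\left(1 - \ln 2\,\frac{\tilde t_0}{T}\right)\int_0^T dt_0\int_0^{t_0} dt\, \langle n_a(t)\rangle.$$

We can suppose that $\langle n_a\rangle$ is increasing on $[0, T[$. It follows that
$\frac{1}{T^2\langle\bar n_a\rangle}\int_0^T dt_0\int_0^{t_0} dt\, \langle n_a(t)\rangle < 1/2$.
Recalling that $\overline{\langle n_a\rangle} = \langle\bar n_a\rangle$, this implies also that the
same quantity is larger than $\langle n_a(0)\rangle/(2\langle\bar n_a\rangle)$ and larger than
$\langle\bar n_a\rangle/(2\langle n_a(T)\rangle)$. At steady state
$\langle n_a(T)\rangle = 2\,\langle n_a(0)\rangle$. Thus, from the preceding inequalities:

$$\mathcal{R}_a \in [0.15, 0.45].$$

In the same way, linearizing $p(t_0)$ and showing that $\mathcal{S}_{ab}$ can be expressed in terms
of the integral of a convex function, we find $\mathcal{S}_{ab} \in [0, 0.45]$.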

In the case of two divisions, T a is defined following: 



Ta 



1 1 

^2T2 



dt p(t ) [7 dtt + 27 dtt 



pT pto pto \ 

-3io/ dt + 16T dt-3t dt)(n a (t)). 



We follow the same steps as before, only here we consider that each term can vary independently,
thus highly overestimating the bounds for $\mathcal{T}_a$. We find $\mathcal{T}_a \in [0, 9.9]$.



Appendix D: Test functions 

To gauge the quality of the previous estimates and fix minimal intervals, we computed
$\mathcal{R}_a$, $\mathcal{S}_{ab}$ and $\mathcal{T}_a$ after postulating different shapes for the
functions $\langle n_a\rangle$ and $\langle n_a n_b\rangle$.

The changes of variables $t \to t/T$ and $t_0 \to t_0/T$, and the normalization
$\langle n_a\rangle \to \langle n_a\rangle/\langle n_a(0)\rangle$ leave $\mathcal{R}_a$ and
$\mathcal{T}_a$ unchanged. We can thus limit ourselves to increasing functions on [0, 1], going
from 1 to 2. We considered step, sigmoid, exponential, logarithmic, sine functions and monomials of
various degrees. Each type is defined by one or two parameters: each parameter was given six values
in the first case and four in the second case.

For $\mathcal{S}_{ab}$, we considered the product of any two functions among those above. (This
implies $\langle n_a(T)\, n_b(T)\rangle = 4\,\langle n_a(0)\, n_b(0)\rangle$, which in general is
not true.)

We used the exact expression of $p(t_0)$. We found:

$\mathcal{R}_a^{\rm test} \in [0.36, 0.44]$; $\mathcal{S}_{ab}^{\rm test} \in [0.20, 0.28]$; $\mathcal{T}_a^{\rm test} \in [5.7, 6.1]$.

Thus the interval found in general for $\mathcal{R}_a$ is rather good, whereas these results seem
to confirm that the intervals found for $\mathcal{S}_{ab}$ and $\mathcal{T}_a$ were highly
overestimated in the previous analysis.
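The kind of computation described in this appendix can be sketched for a single test function. The code below is an illustration under stated assumptions, not the authors' code: it takes $\mathcal{R}_a$ as the age-weighted integral of $\langle n_a\rangle$ from 0 to $t_0$ divided by $T\langle\bar n_a\rangle$, takes $\mathcal{S}_{aa}$ as the age-weighted square of the same inner integral divided by $T^2\langle\bar n_a\rangle^2$ (the product form used here for the test functions), and uses a midpoint quadrature. The linear profile and all function names are hypothetical.

```python
import math


def age_density(t0, T=1.0):
    """Exact age distribution p(t0) = (2 ln 2 / T) * 2**(-t0 / T)."""
    return (2.0 * math.log(2.0) / T) * 2.0 ** (-t0 / T)


def correction_terms(mean_copy_number, T=1.0, steps=2000):
    """Return (R, S) for a deterministic mean copy number profile on [0, T)."""
    dt = T / steps
    mids = [(k + 0.5) * dt for k in range(steps)]
    n_bar = sum(mean_copy_number(t) for t in mids) * dt / T  # cell-cycle average
    cumulative = 0.0  # integral of the profile from 0 to the left edge of the bin
    r_sum = 0.0
    s_sum = 0.0
    for t0 in mids:
        n_here = mean_copy_number(t0)
        inner = cumulative + 0.5 * n_here * dt  # integral from 0 to t0
        weight = age_density(t0, T) * dt
        r_sum += weight * inner
        s_sum += weight * inner * inner
        cumulative += n_here * dt
    return r_sum / (T * n_bar), s_sum / (T * T * n_bar * n_bar)


def linear_profile(t):
    """Hypothetical test shape: copy number rising linearly from 1 to 2 (T = 1)."""
    return 1.0 + t


r_value, s_value = correction_terms(linear_profile)
```

For the linear profile this gives values near 0.39 and 0.23, which fall inside the reduced intervals [0.36, 0.44] and [0.20, 0.28] quoted above.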



Appendix E: Simulations 

To test the inference method proposed in this article, we roughly simulated the experiment and
compared the inferred quantities to the true ones. The results are presented in Table II. We
introduce here simple models of gene expression and replication, but recall that no such models are
assumed in the inference method. The transcription of each reporter gene, the translation of the
corresponding RNAs and their degradation, and the replication of the plasmids are each implemented
as single stochastic reactions [30]:

$$\begin{aligned}
\text{Gene}_a + \text{Inducer} &\to \text{Gene}_a + \text{Inducer} + \text{mRNA}_a \\
\text{mRNA}_a &\to \text{mRNA}_a + \text{Protein}_a \\
\text{mRNA}_a &\to \emptyset \\
\text{Gene}_a &\to 2\,\text{Gene}_a
\end{aligned}$$

where $a$ again refers to either the plasmid or the chromosome. We follow here a simplified
reaction scheme and ignore the polymerase binding to the promoter, the formation of an open complex
and initiation of transcription, the binding of ribosomes on mRNAs, as well as the formation of
complexes with RNases before the decay of mRNAs. We used kinetic parameter values chosen so as to
find the same protein number distributions (expressed from the chromosome) as with the full
reaction scheme simulated in [26], and Swain et al. found a good agreement between the two schemes:
the parameters of the simple scheme effectively capture these underlying processes.

Noise sources related to the elongation of mRNAs and 
proteins (the fact that they are not produced in single 
steps) are also ignored. They would include the traffic 
of polymerases or ribosomes, pauses in transcription or 
translation, etc. We guess that they could similarly be 
accounted for in the rate constants and thus would not 
qualitatively change the results. 

The rates of transcription and translation are the same for both reporters. We impose the number of
chromosomes to go from 1 to 2 at 40% of the cell cycle. The number of inducers is also imposed: it
is 0 until some time $t_{\rm lag}$, and 1 until the end of the simulation.

The duration of the cell cycle $T$ is either fixed (cases 1 and 2) or drawn from a gamma
distribution with parameters chosen so that the average division time is 1800 or 1200 s, with
typical variations of 10% (cases 3 and 4) [31]. The age of the cell at the beginning of the
simulation is either 0 (case 1) or drawn from the exponential distribution indicated in the main
text. The time $t_{\rm lag}$ has been arbitrarily fixed at 10 times the mean cell cycle duration.
The induction, i.e. the remainder of the simulation, has always been taken to last 3600 s.

We follow, in each simulation, one lineage. Upon cell 
division, the RNAs, proteins and plasmids are randomly 
picked to stay (their numbers drawn from a binomial dis- 
tribution). 100,000 simulations were performed in each 
of the four cases. 

Importantly, there is no control of the plasmid copy number: we simply fix the rate of replication
to the growth rate $\ln 2/T$, so that the plasmids are on average replicated once per cell cycle. A
steady state of plasmid copy number is thus not reached, contrary to the more realistic assumption
made earlier; the fact that we can still recover the mean PCN and the PCN noise shows that even
this assumption is not critical.
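The uncontrolled replication described here can be sketched with a minimal stochastic simulation of the plasmid count alone. This is an illustration written for this purpose, not the simulation code of the article: it keeps only the replication reaction (each plasmid replicates at rate $\ln 2/T$, simulated with exponential waiting times) and the binomial partition at division, with a fixed division time. The function name and default values are hypothetical.

```python
import math
import random


def simulate_plasmid_lineage(n0=10, T=1800.0, cycles=2, rng=None):
    """Follow the plasmid count along one lineage, without copy number control.

    Each plasmid replicates at rate ln 2 / T; at each division every plasmid
    stays in the followed cell with probability 1/2.
    """
    rng = rng or random.Random()
    rate = math.log(2.0) / T
    n = n0
    for _ in range(cycles):
        t = 0.0
        while n > 0:
            # waiting time to the next replication among the n plasmids
            t += rng.expovariate(rate * n)
            if t >= T:
                break
            n += 1
        # binomial partition at division
        n = sum(1 for _ in range(n) if rng.random() < 0.5)
    return n


rng = random.Random(0)
final_counts = [simulate_plasmid_lineage(cycles=3, rng=rng) for _ in range(2000)]
mean_count = sum(final_counts) / len(final_counts)
variance = sum((x - mean_count) ** 2 for x in final_counts) / len(final_counts)
```

In this toy model the mean count stays at its initial value, but the variance keeps growing from one cycle to the next (by about the initial count per cycle), which illustrates why no steady state of the plasmid copy number is reached without a control mechanism.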

Each cell has 10 plasmids at the beginning of the simulation. The cell cycle average $\bar{\cdot}$
is taken during the first full cycle of induction.

All codes are available upon request. 